Background of the Invention:

Coffee contains a highly complex mixture of flavour molecules. Extensive research on the
composition of roasted and fresh ground coffee beverages has, to date, identified more than
850 compounds, many of which are flavour active molecules (Flament, I (2002) Coffee
Flavor Chemistry, John Wiley and Sons, UK). However, few of the final coffee flavour
molecules found in the cup are present in the raw material, the green grain (green beans) of
the plant species Coffea arabica or Coffea canephora (robusta). In fact, [...]
coffee flavour compounds are generated during one or more of the multiple processing steps
[...] cherries to the final roasted ground coffee
product, or extracts thereof, for example soluble coffee products.

The various steps in the production of coffee are described in Smith, A.W., in Coffee:
Volume 1: Chemistry, pp 1-41, Clarke, R.J. and Macrae, R. eds, Elsevier Applied Science,
London and New York, 1985; Clarke, R.J., in Coffee: Botany, Biochemistry, and Production
of Beans and Beverage, pp 230-250 and pp 375-393, Clifford, M.N. and Willson, K.C.
eds, Croom Helm Ltd, London. Briefly, the process starts with the collection of mature, ripe
red cherries. The outer layer, or pericarp, can then be removed using either the dry or wet
process. The dry process is the simplest and involves 1) classification and washing of the
cherries, 2) drying the cherries after grading (either air drying or mechanical drying) and 3)
dehusking the dried cherries to remove the [...]. The wet process is slightly
more complicated, and generally leads to the production of higher quality green beans. The wet
process is more often associated with C. arabica cherries. The wet process consists of 1)
classification of the cherries, 2) pulping of the cherries, [...] cherries,
3) "fermentation", the mucilage that remains attached to the grain of the cherries after
pulping is removed by allowing the grain plus attached mucilage to be incubated with water
using a batch process. The "fermentation" process is allowed to continue up to 80
hours, [...] hours is generally [...] 6.8-6.9 to 4.2-4.6 [...] to various [...] and
the metabolic action of microorganisms which grow during the fermentation, 4) drying, this
step involves either air or mechanical hot air drying of the fermented coffee grain and 5)
"hulling", this step involves the mechanical removal of the "parch" of the dried coffee grain
(dried parchment coffee) and often the silverskin is also removed at this stage. After wet or
dry processing, the resulting green coffee grain are often sorted, with most sorting procedures
being based on grain size and/or shape.

The next step in coffee processing is the roasting of the green grain after dehusking or 
dehulling of dry or wet processed coffee, respectively. This is a time-dependent process 
which induces significant chemical changes in the bean. The first phase of roasting occurs 
when the supplied heat drives out the remaining water in the grain. When the bulk of the
water is gone, roasting proper starts as the temperature rises towards 190-200°C. The degree
of roasting, which is usually monitored by the colour development of the beans, plays a major 
role in determining the flavour characteristics of the final beverage product. Thus, the time 
and temperature of the roasting are tightly controlled in order to achieve the desired coffee 
flavour profile. After roasting, the coffee is ground to facilitate extraction during the 
production of the coffee beverage or coffee extracts (the latter to be used to produce instant 
coffee products). Again, the type of grinding can influence the final flavour of the beverage. 

While a considerable amount of research has been carried out on the identification of the
flavour molecules in coffee, much less work has been done regarding the physical and 
chemical reactions which occur within the coffee grains during each of the processing steps. 
This latter point is particularly evident for the roasting reaction, where the large number of 
grain constituents undergo an extremely complex series of heat induced reactions (Homma,
S. 2001, In "Coffee: Recent Developments". R.J. Clarke and O.G. Vitzthum eds, Blackwell
Science, London; Yeretzian, C., et al (2002) Eur. Food Res. Technol. 214, 92-104; Flament,
I (2002) Coffee Flavor Chemistry, John Wiley and Sons, UK; Reineccius, G.A., "The
Maillard Reaction and Coffee Flavor" Conference Proceedings of ASIC, 16th Colloque,
Kyoto, Japan 1995). Because of the large number of the potential reactants, and the
complexity of the reactions involved, little research data currently exist which establishes 
strong links between specific biological molecules present in the green beans and the 
numerous flavour molecules identified in a cup of coffee. 

Nonetheless, while the details of most of the reactions that occur during the different steps of 
coffee processing remain relatively unclear, it is thought that an important flavour generating 
reaction responsible for many of the flavours associated with coffee aroma is the "Maillard"
reaction during coffee roasting. A vigorous Maillard reaction occurs between the grain 
reducing sugars/polysaccharide degradation products and the amino group containing 
molecules (particularly the proteins, peptides, and amino acids) during the roasting step. 

Because the Maillard reaction apparently makes an important contribution to the generation 
of coffee flavour and aroma molecules during coffee roasting, there might be an association 
between the levels of primary Maillard reactants in the green beans and the quality of the 
flavour/aroma developed after roasting. 



As noted above, an important group of substrates in the Maillard reaction are the amino acids, 
peptides and proteins. To date, there are no firm associations between the levels, or types, of 
amino group containing molecules in different coffees and the qualities of those coffees. 
Although the amino acid contents of arabica and robusta coffee beans have been analysed
(M.N. Clifford 1985, In "Coffee: Botany, Biochemistry, and Production of Beans and
Beverage", Clifford and Willson eds, Croom Helm London; Arnold et al 1996, Z. Lebensm
Unters Forsch., Vol 199, 22-25; Ludwig et al 2000, Eur. Food Res Technol., Vol 211,
111-116), there are no reports directly linking specific levels or ratios of amino acids and high or
low flavour qualities. Using 2-D electrophoresis, it has been shown that differences exist in
the levels and amounts of the major storage proteins in arabica and robusta green coffee
beans; however, no association between these storage protein differences and flavour quality
was noted (Rogers et al, 1999, Plant Physiol. Biochem. Vol 37, 261-272). It has also recently
been found that small differences exist between the storage proteins of immature and mature
coffee beans, which have different flavour qualities (Montavon, P. et al, 2003, J. Agric and
Food Chemistry Vol 51, 2328-2334). However, because there are many changes occurring
during seed maturation, this latter work only suggests a link may exist between the quality
improvement caused by maturation and the differences seen in the 2-D gel patterns of the
main coffee storage proteins. Overall, currently no clear evidence exists linking any
differences seen for the coffee storage proteins, or other major green bean proteins, and the
flavour qualities of coffee.

It has recently been shown that there are differences in the profiles of peptides isolated from 
arabica and robusta green beans (Ludwig et al 2000, Eur. Food Res Technol., Vol 211,
111-116). Although their results showed that the arabica and robusta peptide extracts differ in
their aroma precursor profile, the data presented in this report do not identify which
component(s) in the extracts is / are responsible for these aroma profile differences. These 
workers also detected at least two different proteinase activities in crude extracts of the green 
coffee, but they did not correlate any specific activities with aroma/flavour quality (Ludwig et 
al 2000, Eur. Food Res Technol., Vol 211, 111-116). Finally, it is also thought that the very
high temperatures used during the later stages of green coffee grain roasting cause substantial 
cleavage of the proteins present in the coffee grain (Homma, S. 2001, In "Coffee: Recent 
Developments". R.J. Clarke and O.G. Vitzthum eds, Blackwell Science, London; Montavon, 
P., et al 2003, "Changes in green coffee protein profiles during roasting", J. Agric. Food 
Chem. 51, 2335-2343). The overall scheme for this protein degradation is very
poorly understood, but it presumably depends on, among other things, the precise state of the
main coffee proteins in the raw material before the start of roasting. To our knowledge, there
are no other significant reports addressing the possibility that peptide profiles in coffee could
be involved in the production of coffee aroma/flavour.

In the roasting of the fermented seeds of Theobroma cacao (cocoa beans), there would appear 
to be an involvement of seed amino acids and peptides in the development of Maillard 
reaction aromas/flavours. Relative to other seeds, T. cacao seeds have been shown to have an 
unusually high level of aspartic proteinase activity (Biehl, B., Voigt, J., Voigt, G., Heinrichs,
H., Senyuk, V. and Bytof, G. (1994) "pH dependent enzymatic formation of oligopeptides
and amino acids, the aroma precursors in raw cocoa beans" In The Proceedings of the 11th
International Cocoa Research Conference, 18-24 July 1993, Yamoussoukro, Ivory Coast).
In order to produce cocoa beans with a high level of cocoa flavour precursors, it is necessary 
to carry out a natural fermentation step (unfermented beans develop little flavour when 
roasted). During this fermentation step, the sugars in the pulp are fermented, generating high 
levels of acids, particularly acetic acid (Carr, J.G. (1982) Cocoa. In Fermented Foods. 
Economic Microbiology. Vol 7. pages 275-292. (A.H. Rose ed). Academic Press). As the 
fermentation continues, the pH in the seed decreases and the cell structure becomes disrupted. 
The low pH triggers the abundant cacao seed aspartic proteinase to become mobilized and/or 
activated, resulting in a massive degradation of cellular protein (Biehl, B., Passern, D. and
Sagemann, W. (1982) "Effect of Acetic Acid on Subcellular Structures of Cocoa Bean
Cotyledons". J. Sci. Food Agric. 33, 1101-1109; Biehl, B., Brunner, E., Passern, D.,
Quesnel, V.C., and Adomako, D. (1985) "Acidification, proteolysis and flavour potential in
fermenting cocoa beans". J. Sci. Food Agric. 36, 583-598). Peptides and amino acids have
been shown to be cocoa flavour precursors (Rohan, T. (1964) "The precursors of chocolate
aroma: a comparative study of fermented and unfermented cocoa beans". J. Food Sci., 29,
456-459; Voigt, J. and Biehl, B. (1995) "Precursors of the cocoa specific aroma components
are derived from the vicilin-class (7S) globulin of the cocoa seeds by proteolytic processing".
Bot. Acta 108, 283-289). Thus, the T. cacao seed aspartic proteinase, together with a seed
serine carboxypeptidase, have been proposed to be critical for the generation of cocoa flavour
precursors during fermentation (Voigt, J. and Biehl, B. (1995) "Precursors of the cocoa
specific aroma components are derived from the vicilin-class (7S) globulin of the cocoa seeds
by proteolytic processing". Bot. Acta 108, 283-289; Voigt, J., Heinrichs, H., Voigt, G. and
Biehl, B. (1994) "Cocoa-specific aroma precursors are generated by proteolytic digestion of
the vicilin-like globulin of cocoa seeds". Food Chemistry, 50, 177-184). The gene encoding
the abundant cacao seed aspartic proteinase has been identified and a method to over-express
this protein in cacao seeds which can generate increased levels of cacao flavour precursor
amino acids and peptides in fermented cocoa beans has recently been described in
International Patent Publication No. 02/04617, the whole contents of which are incorporated
herein by reference. However, the teaching of International Patent Publication No. 02/04617
is directed towards cacao seeds, which undergo a specific long acid fermentation step, unlike
coffee grains.

Objects of the Invention

It is an object of the present invention to improve the flavour quality of coffee. 

More specifically, it is an object of the present invention to improve the levels of the flavour
precursors in the raw material (the green grain) so that, following post harvest treatment and
roast-processing, an improved flavour is achieved. Without being bound by theory, it is
believed that, if there are variations in the levels of peptides and protein degradation between
coffees with significantly different flavours, then it is possible that these variations could be
due to differences in the endogenous proteinase activities in these different grains. This
difference might be detectable at the level of mRNA expression by variations in the levels of
expression for particular seed proteinase genes.

Statements of the Invention 

The present invention involves, therefore, identifying gene sequences encoding for coffee
grain (seed) specific proteinases and showing that there are indeed variations in the 
expression of these genes in arabica and robusta. 

More specifically, the present invention discloses a major coffee cysteine proteinase (CcCP-1),
a major coffee cysteine proteinase inhibitor (CcCPI-1) and coffee aspartic proteinases
(CcAP-1 and CcAP-2), all of which are expressed in coffee seeds. We further show how
either over-expression of these proteins specifically late in seed development, or the reduced
expression of these proteins during late seed development, can alter the amino
acid/peptide/protein profile of the mature beans. By using one or more of the disclosed gene
sequences and gene constructs to alter the amino acid/peptide/protein profile of the mature
beans, we disclose a new method to alter the flavour precursor profile of mature coffee beans,
thereby allowing the production of roasted coffee beans with altered flavours.

In a first aspect, the present invention provides an isolated polynucleotide comprising a 
nucleotide sequence encoding a polypeptide having cysteine proteinase activity, wherein the 
amino acid sequence of the polypeptide and the amino acid sequence of SEQ ID No. 2 have
at least 70%, preferably at least 80%, sequence identity based on the ClustalW alignment 
method; or the complement of the nucleotide sequence, wherein the complement contains the 
same number of nucleotides as the nucleotide sequence, and the complement and the 
nucleotide sequence are 100% complementary. Preferably, the amino acid sequence of the 
polypeptide and the amino acid sequence of SEQ ID No. 2 have at least 85%, preferably at 
least 90%, optionally at least 95%, sequence identity based on the ClustalW alignment
method. Preferably, the nucleotide sequence comprises the nucleotide sequence of SEQ ID 
No. 1. Preferably, the polypeptide comprises the amino acid sequence of SEQ ID No. 2. 
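
By way of illustration only, the complement condition recited above (the same number of nucleotides, and 100% complementarity) can be checked mechanically. The following Python sketch is not part of the disclosed method: the function names and the example sequence are hypothetical, and it assumes that both strands are written 5' to 3', so that the complement is read as the reverse complement.

```python
# Hypothetical helper names; the example sequence is made up and is
# not taken from the Sequence Listing.
_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return "".join(_PAIR[base] for base in reversed(seq.upper()))


def is_full_complement(seq: str, candidate: str) -> bool:
    """True if candidate has the same length as seq and pairs with it at every base."""
    return len(candidate) == len(seq) and candidate.upper() == reverse_complement(seq)


example = "ATGGCC"
print(reverse_complement(example))
print(is_full_complement(example, "GGCCAT"))
```

A candidate that is shorter than the reference, or that mismatches at any single position, fails the check, which mirrors the two conditions stated in the text.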

In a second aspect, there is provided an isolated polynucleotide comprising a nucleotide
sequence encoding a polypeptide having cysteine proteinase inhibitor activity, wherein the 
amino acid sequence of the polypeptide and the amino acid sequence of SEQ ID No. 4 have 
at least 70%, preferably at least 80%, sequence identity based on the ClustalW alignment 
method; or the complement of the nucleotide sequence, wherein the complement contains the 
same number of nucleotides as the nucleotide sequence, and the complement and the 
nucleotide sequence are 100% complementary. Preferably, the amino acid sequence of the 
polypeptide and the amino acid sequence of SEQ ID No. 4 have at least 85%, preferably at 
least 90%, optionally at least 95%, sequence identity based on the ClustalW alignment
method. Preferably, the nucleotide sequence comprises the nucleotide sequence of SEQ ID 
No. 3. Preferably, the polypeptide comprises the amino acid sequence of SEQ ID No. 4. 

In a third aspect, there is provided an isolated polynucleotide comprising a nucleotide 
sequence encoding a polypeptide having aspartic endoproteinase activity, wherein the amino
acid sequence of the polypeptide and the amino acid sequence selected from SEQ ID No. 6 or 
8, preferably SEQ ID No. 8, have at least 75%, preferably at least 80%, sequence identity 
based on the ClustalW alignment method, or the complement of the nucleotide sequence, 
wherein the complement contains the same number of nucleotides as the nucleotide sequence, 
and the complement and the nucleotide sequence are 100% complementary. Preferably, the
amino acid sequence of the polypeptide and the amino acid sequence selected from SEQ ID 
No. 6 or 8, preferably SEQ ID No. 8, have at least 85%, preferably at least 90%, optionally at 
least 95%, sequence identity based on the ClustalW alignment method. Preferably, the 
nucleotide sequence comprises the nucleotide sequence of SEQ ID No. 5 or 7, preferably
SEQ ID No. 7. Preferably, the polypeptide comprises the amino acid sequence of SEQ ID 
No. 6 or 8, preferably SEQ ID No. 8. 

In a further aspect, there is provided a vector comprising the polynucleotide of any one of
the first to third aspects of the invention.

In a further aspect, there is provided a recombinant DNA construct comprising the 
polynucleotide of any one of the first to third aspects of the invention, operably linked to a
regulatory sequence. 

In a further aspect, there is provided a method for transforming a cell comprising
transforming the cell with the polynucleotide of any one of the first to third aspects of the present
invention.

In a further aspect, there is provided a cell comprising the aforementioned recombinant DNA
construct, which cell is preferably a prokaryotic cell, a eukaryotic cell or a plant cell,
preferably a coffee cell.

In a further aspect, there is provided a transgenic plant comprising such a transformed cell. 

In the present application, coffee cherry terms are defined as follows: coffee cherry, entire
fruit; exocarp, skin; pericarp, fleshy major outer layer of cherry; and grain, coffee seed. For a
fuller explanation of these terms, reference is made to Clarke, R.J., in Coffee: Botany,
Biochemistry, and Production of Beans and Beverage, pp 230, Clifford, M.N. and Willson,
K.C. eds, Croom Helm Ltd, London, the contents of which are incorporated herein in their entirety.

Brief Description of the Invention

The invention can be understood from the following detailed description and the 
accompanying Sequence Listing which forms part of the present application. 

Table 1 hereunder lists the polypeptides that are described herein, along with the 
corresponding sequence identifier (SEQ ID No) as used in the attached listing. 

Table 1:

SEQ ID No 1 (CcCP1: Cysteine proteinase, nucleic acid and its corresponding amino acid)
SEQ ID No 2 (CcCP1: Cysteine proteinase, amino acid)
SEQ ID No 3 (CcCPI-1: Cysteine proteinase inhibitor, nucleic acid and its corresponding amino acid)
SEQ ID No 4 (CcCPI-1: Cysteine proteinase inhibitor, amino acid)
SEQ ID No 5 (CcAP1: Aspartic endoproteinase 1, nucleic acid and its corresponding amino acid)
SEQ ID No 6 (CcAP1: Aspartic endoproteinase 1, amino acid)
SEQ ID No 7 (CcAP2: Aspartic proteinase 2, nucleic acid and its corresponding amino acid)
SEQ ID No 8 (CcAP2: Aspartic proteinase 2, amino acid)

The sequence listing employs the one letter codes for nucleotide sequence characters and the 
three letter codes for amino acids as defined for IUPAC-IUBMB Standards and as described 
in Nucleic Acids Research 13:3021-3030 (1985), which is incorporated herein by reference. 

Drawings

In the drawings,

Figure 1 shows a Northern blot analysis of cysteine proteinase gene in different tissues of
Coffea arabica, in which the lanes are labeled R: root, S: stem, L: young leaves; and SG, LG, 
Y and Red are grain from small green fruit, large green fruit, yellow fruit and red fruit, 
respectively. Five micrograms of total RNA was loaded in each lane. MW is an RNA size 
ladder. Panel A illustrates an autoradiography after 24 hours exposure and Panel B 
demonstrates the ethidium bromide staining of the gels prior to blotting;

Figure 2 shows a Northern blot analysis of Cysteine proteinase inhibitor gene in different 
tissues of Coffea arabica, in which the lanes are labeled R: root, S: stem, L: young leaves and 
SG, LG, Y and Red for grain from small green fruit, large green fruit, yellow fruit and red 
fruit, respectively. Five micrograms of total RNA was loaded in each lane. MW is an RNA
size ladder. Panel A illustrates an autoradiography after 24 hours exposure and panel B 
demonstrates the ethidium bromide staining of the gels prior to blotting; 






Figure 3 shows a Northern blot analysis of Cysteine proteinase inhibitor gene in different 
stages of development of Coffea arabica (ARA) and Coffea robusta (ROB) fruit. The lanes 
are labeled small green fruit (SG), large green fruit (LG), yellow fruit (Y) and red fruit (Red), 
respectively. Five micrograms of total RNA was loaded in each lane. MW is an RNA size 
ladder. Panel A illustrates an autoradiography after 24 hours exposure. Panel B 
demonstrates the ethidium bromide staining of the gels prior to blotting; and 

Figure 4 shows a Northern blot analysis of aspartic proteinase 2 (CcAP2) gene in different
tissues of Coffea arabica, in which the lanes are labelled R: root, S: stem, L: young leaves, F:
flowers; SG(G), LG(G), Y(G) and Red(G) are grain, and SG(P), LG(P), Y(P) and Red(P) are
pericarp, from small green, large green, yellow and red cherries, respectively.
Five micrograms of total RNA was loaded in each lane. Panel A demonstrates
the ethidium bromide staining of large ribosomal RNA prior to blotting as a loading control
and panel B is an autoradiogram showing the appearance of the CcAP2 mRNA in the specific
tissues tested.



Detailed Description of the Invention 



As used herein, a "polynucleotide" is a nucleotide sequence such as a nucleic acid fragment. 
A polynucleotide may be a polymer of RNA or DNA that is single- or double-stranded, that 
optionally contains synthetic, non-natural or altered nucleotide bases. A polynucleotide in 
the form of a polymer of DNA may comprise one or more segments of cDNA, genomic 
DNA, synthetic DNA or mixtures thereof.

Similar nucleic acid fragments are characterised, in the present invention, by the percent 
identity of the amino acid sequences that they encode, to the amino acid sequences disclosed 
herein, as determined by algorithms commonly used by those skilled in the art. Suitable 
nucleic acid fragments (or isolated polynucleotides of the first to third aspects of the present 
invention) encode polypeptides that are at least 70% identical, preferably at least 80% 
identical, to the amino acid sequences disclosed herein. Preferred nucleic acid fragments 
encode amino acid sequences that are at least 85% identical to the amino acid sequences 
disclosed herein. More preferred nucleic acid fragments encode amino acid sequences that 
are at least 90% identical to the amino acid sequences disclosed herein. Still more preferred 
are nucleic acid fragments that encode amino acid sequences that are at least 95% identical to 
the amino acid sequences disclosed herein. Multiple alignment of sequences should be
performed using the ClustalW method of alignment (Thompson et al, 1994, Nucleic Acids
Research, Vol 22, p4673-4680; Higgins & Sharp 1989 CABIOS 5:151-153).
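
As a non-limiting illustration of how the identity thresholds above might be applied, the following Python sketch computes percent identity from two sequences that have already been aligned (for example with ClustalW) and reports the highest threshold that is met. It assumes that identity is counted over aligned columns in which neither sequence has a gap; the scoring used by ClustalW itself may differ in detail. The function names and the example sequences are hypothetical and are not taken from the Sequence Listing.

```python
# Thresholds named in the text, highest first.
TIERS = (95, 90, 85, 80, 70)


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither aligned sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # assumption: gapped columns are not counted
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def highest_tier(identity: float):
    """Return the highest threshold met, or None if below the lowest one."""
    for tier in TIERS:
        if identity >= tier:
            return tier
    return None


# Made-up aligned fragments: 6 identical residues out of 7 ungapped columns.
identity = percent_identity("MKVLA-GT", "MKILAAGT")
print(f"{identity:.1f}% identity, highest threshold met: {highest_tier(identity)}")
```

For the aspartic endoproteinase aspect, where the lowest recited threshold is 75% rather than 70%, the `TIERS` tuple would be adjusted accordingly.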

As used herein, the term "similar nucleic acid fragments" refers to polynucleotide sequences 
in which changes in one or more nucleotide bases result in substitution of one or more amino
acids, but which changes either do not affect the function of the polypeptide encoded by the
nucleotide sequence or do not affect the ability of the nucleic acid fragment to mediate gene
expression by gene silencing via, for example, antisense or co-expression technology. The
term "similar nucleic acid fragments" also refers to modified polynucleotide sequences, in
which one or more nucleotide bases is/are deleted or inserted, provided that the
modifications either do not affect the function of the polypeptide encoded by the
nucleotide sequence or do not affect the ability of the nucleic acid fragment to mediate gene
expression by gene silencing. It will, therefore, be understood that the scope of the present
invention extends beyond the polynucleotide and polypeptide sequences specifically
disclosed herein.

Similar nucleic acid fragments may be selected by screening nucleic acid fragments in the 
form of subfragments or modified nucleic acid fragments, for their ability to affect the level 
of the polypeptide encoded by the unmodified nucleic acid fragments in the plant or plant 
cell.



The term "operably linked" refers to the association of two or more nucleic acid fragments on 
a single nucleic acid fragment so that the function of one is affected by the other. 
"Regulatory sequences" refer to nucleotide sequences located upstream, within, or 
downstream, of a coding sequence and which influence transcription, RNA processing or 
stability, or translation of the coding sequence associated therewith. Regulatory sequences 
may include promoters, translation leader sequences, introns and polyadenylation recognition 
sequences. When a regulatory sequence in the form of a promoter is operably linked to a 
coding sequence, the regulatory sequence is capable of affecting the expression of the coding 
sequence. Coding sequences can be operably linked to regulatory sequences in sense or 
antisense orientation. 

The term "expression" refers to the transcription, and stable accumulation, of sense RNA 
(mRNA) or antisense RNA derived from the nucleic acid fragments of the present invention. 
Expression may also refer to the translation of mRNA into a polypeptide. Overexpression 
refers to the production of a gene product in a transgenic cell, that exceeds the level of 
production in normal, or non-transformed, cells. "Altered levels" refers to the production of 
gene product(s) in a transgenic cell in amounts or proportions that differ from that of normal,
or non-transformed, cells. 

"Transformation" refers to the transfer of a nucleic acid fragment into the genome of a host
cell, resulting in genetically stable inheritance. Host cells containing the transformed nucleic
acid fragments are referred to herein as "transgenic cells".

Standard recombinant DNA and molecular cloning techniques as used herein are well known 
in the art and are described more fully in Sambrook et al "Molecular Cloning: A Laboratory 
Manual"; Cold Spring Harbor Laboratory Press: Cold Spring Harbor, 1989, which is 
incorporated herein by reference. 


Examples 

The following Examples illustrate the invention without limiting the invention to the same. 
In the examples, all parts and percentages are by weight and degrees are in Celsius, unless 
this is otherwise specified.

In the following Examples, these abbreviations have been used: 
PCR : Polymerase chain reaction 
RACE : Rapid amplification of cDNA ends


From the above discussion and the Examples below, those skilled in the art can ascertain the 
essential features of the present invention, and without departing from the scope thereof can 
make various changes and modifications thereto, to adapt it to various usages and conditions 
as desired. 


Production of cDNA libraries and screening 
Production of Seed Specific RNA 

Coffee cherries of the Robusta variety Q121 were harvested 30 WAF (weeks after flowering) 
at the ICCRI, Indonesia. The pericarps of these cherries were then removed and the
remaining perisperm/endosperm material was frozen and ground to a powder in liquid 
nitrogen. The RNA was extracted from the frozen powder material using the method 
described previously for the RNA extraction of cacao seeds (Guilloteau, M. et al, 2003, Oil 
bodies in Theobroma cacao seeds: cloning and characterisation of cDNA encoding the 15.8 
and 16.9 kDa oleosins. Plant Science Vol 164, 597-606). Poly A+ RNA was prepared from
approximately 250 µg total RNA using the "PolyA Purist™" kit (manufactured
by Ambion, Inc.) according to the kit instructions.

Production of First Set of Seed cDNA clones

Approximately 50-100 ng of this poly A+ RNA was then employed in the synthesis of the first
strand cDNA using "SuperScript™ II RNase H-" reverse transcriptase (GIBCO BRL™) and
the SMART™ PCR cDNA synthesis kit (Clontech) as follows. A reaction was set up containing 2 µL of
30 WAF poly A+ RNA, 1 µL CDS oligo (SMART™ PCR cDNA kit, Clontech), 1 µL SMART
II oligo (SMART™ PCR cDNA kit, Clontech), and 8 µL deionised H2O. This mixture was
heated to 72°C for 5 minutes and then placed on ice. Then the following was added: 1 µL 10
mM dNTPs, 4 µL SuperScript™ II 1st strand buffer and 2 µL DTT. This mixture was put at
42°C for 2 minutes, then 1 µL of SuperScript™ II RNase H- reverse transcriptase (200 units/µL,
GIBCO BRL™) was added and the mixture was incubated in an air circulating incubator at
42°C for a further 50 minutes.
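
Purely as an arithmetic check on the reverse transcription reaction described above, the following Python sketch adds up the listed components, reading every volume in microlitres, and derives the final dNTP concentration and the enzyme units added. The dictionary keys are informal labels chosen for this sketch, and the calculation assumes that no volume is lost to evaporation or pipetting.

```python
# Volumes in microlitres, as read from the protocol text above.
annealing_mix_ul = {
    "poly A+ RNA": 2.0,
    "CDS oligo": 1.0,
    "SMART II oligo": 1.0,
    "deionised water": 8.0,
}
later_additions_ul = {
    "10 mM dNTPs": 1.0,
    "first strand buffer": 4.0,
    "DTT": 2.0,
    "reverse transcriptase (200 units/uL)": 1.0,
}

annealing_total_ul = sum(annealing_mix_ul.values())
final_total_ul = annealing_total_ul + sum(later_additions_ul.values())

# C1 * V1 = C2 * V2 for the dNTP stock diluted into the final volume.
dntp_final_mm = 10.0 * later_additions_ul["10 mM dNTPs"] / final_total_ul
enzyme_units = 200.0 * later_additions_ul["reverse transcriptase (200 units/uL)"]

print(f"annealing mix: {annealing_total_ul:g} uL")
print(f"final reaction: {final_total_ul:g} uL")
print(f"final dNTP concentration: {dntp_final_mm:g} mM")
print(f"reverse transcriptase added: {enzyme_units:g} units")
```

On these figures the first strand buffer makes up one fifth of the final reaction volume.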



After the reverse transcription reaction, the following PCR reaction was carried out. 98 µL of
the Master Mix described in the SMART™ PCR cDNA kit (Clontech) containing
Advantage™ 2 polymerase (Advantage™ 2 PCR kit, Clontech) was set up on ice and then 3
µL of the 1st strand cDNA synthesis reaction described above was added. This 100 µL PCR
reaction was then placed in a MJ Research PTC-150 HB apparatus and the following PCR
conditions were run: 95°C for 1 minute, then 16 cycles of 95°C for 15 seconds, 65°C for 30
seconds, 68°C for 6 minutes. The amplified DNA was purified using the Strataprep™ PCR
Purification Kit (Stratagene) according to the supplier's instructions. The DNA, which was
eluted in 50 µL deionized water, was then "polished" using the Pfu-1 polymerase reagents
contained in the PCR-Script™ Amp cloning kit (Stratagene) as follows: 50 µL DNA, 5 µL 10
mM dNTPs, 6.5 µL 10 x Pfu-1 polishing buffer, 5 µL cloned Pfu-1 DNA polymerase (0.5
U/µL). This reaction was then incubated at 72°C for 30 minutes in a PCR apparatus with a
heated cover (Perkin Elmer). Using the protocol described in the pPCR-Script™ Amp kit
(Stratagene), the polished (blunted) PCR products were ligated into the Srf-1 digested pPCR-
Script™ Amp SK(+) vector in the presence of Srf-1 enzyme and the ligation reaction
products were transformed into the XL-10 Gold™ Kan ultracompetent E. coli cells.
Selection for transformation with plasmids containing inserts was done using LB-Amp plates 
and IPTG and X-gal spread on the surface as described in the pPCR-Script™ Amp kit. White
colonies were selected and the clones were named Davl-1 etc. 
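
As a rough planning aid, the programmed hold time of the amplification program given above can be tallied as follows. This is only an illustrative sketch: the function name is hypothetical, and ramp times between temperatures, which depend on the thermal cycler, are ignored.

```python
# Hold times (seconds) taken from the amplification program described above.
INITIAL_DENATURATION_S = 60      # 95°C for 1 minute
CYCLE_STEPS_S = (15, 30, 360)    # 95°C 15 s, 65°C 30 s, 68°C 6 min
N_CYCLES = 16


def program_hold_time_s(initial_s, cycle_steps_s, n_cycles):
    """Return the summed hold time of a PCR program, ignoring ramp times."""
    return initial_s + n_cycles * sum(cycle_steps_s)


total_s = program_hold_time_s(INITIAL_DENATURATION_S, CYCLE_STEPS_S, N_CYCLES)
print(f"Programmed hold time: {total_s} s ({total_s / 60:.0f} min)")
```

The real run time will be longer than this figure because of heating and cooling between steps.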

Production of Second Set of seed cDNA clones with Size Selected cDNA 
Seeds highly express a small number of proteins, such as the seed storage proteins (White et 
al., 2000, Plant Physiology, Vol 124, 1582-1594). When cDNA is prepared from such tissue, 
the very high level of the storage proteins and other seed specific proteins leads to a high 
level of cDNA "redundancy", that is, the population of cDNA produced contains high 
proportions of the same cDNA. In order to reduce the redundancy of cDNA made from 
coffee seed mRNA, and to selectively characterise long and weakly expressed cDNA, a 
second cDNA cloning strategy was also used. Using the products of the reverse transcriptase 
reaction described above, the following PCR reaction was set up using the Advantage™ 2 
PCR kit (Clontech): 3 µL of the reverse transcriptase reaction, 5 µL 10 x Advantage™ 2 
PCR buffer, 1 µL dNTPs (10 mM each), 2 µL PCR primer (SMART™ PCR cDNA kit, 
Clontech), 39 µL deionised water, and 1 µL 50 x Advantage™ 2 polymerase mix. This PCR 
reaction was then placed in an MJ Research PTC-150 HB apparatus and the following PCR 
conditions were run: 95°C for 1 minute, then 16 cycles of 95°C for 15 seconds, 65°C for 30 
seconds, 68°C for 6 minutes. At the end of the PCR, 1 µL 10% SDS was added with gel 
loading buffer, and the sample was heated to 37°C for ten minutes. The sample was then split for 
loading onto a 0.7% agarose gel without ethidium bromide: 10% was loaded into a small well 
beside a DNA marker lane and the other 90% was loaded into a neighbouring large, 
preparation scale well. After the gel was run, the gel section with the size markers, plus the 
10% reaction sample, was stained with ethidium bromide. This stained gel section was then 
used as a template to generate gel slices containing PCR amplified cDNA of different sizes 
from the cDNA present in the remaining unstained (preparation) part of the gel. Six gel 
slices were generated having the indicated size range of PCR fragments: A1A (0.8-1 kb), A1B 
(1-1.5 kb), A2 (1.5-2.25 kb), A3 (2.25-3.25 kb), A4 (3.25-4 kb), and A5 (4-6.5 kb). 
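
The size fractions listed above can be expressed as a simple lookup, sketched below. The slice names and size ranges are those given in the text; the function name and the handling of boundary values (lower bound inclusive, upper bound exclusive, except for the largest slice) are assumptions made for illustration only.

```python
# Size ranges (kb) of the six gel slices, as listed in the text.
GEL_SLICES = [
    ("A1A", 0.8, 1.0),
    ("A1B", 1.0, 1.5),
    ("A2", 1.5, 2.25),
    ("A3", 2.25, 3.25),
    ("A4", 3.25, 4.0),
    ("A5", 4.0, 6.5),
]


def slice_for_size(size_kb):
    """Return the name of the gel slice containing size_kb, or None.

    Assumption: a fragment exactly on a boundary goes to the larger slice;
    the upper limit of the last slice (6.5 kb) is included in that slice.
    """
    for index, (name, low, high) in enumerate(GEL_SLICES):
        is_last = index == len(GEL_SLICES) - 1
        if low <= size_kb < high or (is_last and size_kb == high):
            return name
    return None


print(slice_for_size(1.2))
print(slice_for_size(0.5))
```

Fragments below 0.8 kb or above 6.5 kb fall outside every slice and return `None`.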

The DNA in each gel slice was eluted from the agarose using the QIAEX II kit from Qiagen 
following the supplier's instructions (samples 3A, 4A, and 5A were heated for 10 minutes 
at 50°C and 1A, 1B, and 2A were heated for 10 minutes at room temperature). The purified 
double stranded cDNA was then re-amplified by PCR with a TAQ enzyme mix which 
makes fragments having a 3' A overhang as follows: 30 µL of the gel isolated double 
stranded cDNA, 5 µL 10 x TAQ buffer (supplied with TAQ PLUS precision polymerase mix, 
Stratagene), 1 µL 40 mM dNTPs (each 10 mM), 2 µL PCR primer (SMART™ PCR cDNA 
kit, Clontech), 0.5 µL TAQ PLUS precision polymerase mix (Stratagene) and 11.5 µL 
deionised water. The PCR reaction conditions were as follows: 95°C for 1 minute, then 7 
cycles of 95°C for 15 seconds, 65°C for 1 minute, 72°C for 8 minutes, then 1 cycle of 95°C for 
15 seconds, 65°C for 1 minute, 72°C for 10 minutes. 
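
The component volumes of the re-amplification reaction above can be checked with a short tally such as the following sketch. The volumes are those stated in the text; the function names are hypothetical and are introduced only for illustration.

```python
# Component volumes (µL) of the re-amplification reaction described above.
TAQ_PLUS_REACTION_UL = {
    "gel isolated double stranded cDNA": 30.0,
    "10 x TAQ buffer": 5.0,
    "dNTPs (each 10 mM)": 1.0,
    "PCR primer": 2.0,
    "TAQ PLUS precision polymerase mix": 0.5,
    "deionised water": 11.5,
}


def total_volume_ul(components):
    """Return the summed volume of all listed components."""
    return sum(components.values())


def water_to_final_volume_ul(components, final_volume_ul):
    """Return the water needed to bring the non-water components to volume."""
    used = sum(v for name, v in components.items() if "water" not in name.lower())
    if used > final_volume_ul:
        raise ValueError("components already exceed the final volume")
    return final_volume_ul - used


print(total_volume_ul(TAQ_PLUS_REACTION_UL))
print(water_to_final_volume_ul(TAQ_PLUS_REACTION_UL, 50.0))
```

The same helper can be applied to reactions in which water is added "up to" a stated final volume.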

The PCR amplified DNA produced was then ligated into the vector pCR™-TOPO™ and 
cloned into TOP10 E. coli cells using the TOPO™ TA kit (Invitrogen) as described by the 
supplier. The clones were named by their order of isolation and their position in the sizing 
gel (for example, A2-1, A2-2, etc.). 

Seed cDNA Screening and Preliminary Identification 

The first set of white colonies obtained in the Dav-1 library were screened by first determining 
the size of each insert by PCR amplifying the insert using the primers T3 and T7, which flank 
the cloning site used, and examining the PCR amplified fragments on a gel. 

Each white colony was resuspended in 200 µL sterile water and 10-30 µL of this was added to 
5 µL 10X Taq polymerase buffer (Stratagene), 1 µL 10 mM dNTP mix, 2.5 µL 20 µM T3 
primer, 2.5 µL 20 µM T7 primer, 1 µL DMSO, 0.5 µL Taq polymerase (Stratagene), and H2O 
up to 50 µL final volume. The PCR program used was 94°C for 1 min, then 30 
cycles of 94°C for 1 min, 55°C for 1.5 min and 72°C for 3.5 min, and a final cycle of 7 min at 
72°C. To reduce redundancy, the PCR inserts of similar size were subjected to digestion by 
the restriction enzyme HaeIII. Those PCR fragments with the same HaeIII restriction 
pattern were not studied further. The plasmids of clones with PCR fragments >500 bp and 
which had unique HaeIII restriction patterns were then purified using the QIAwell 8 Ultra 
plasmid kit (Qiagen) for 5' end dideoxy sequencing using the appropriate T7 or T3 
sequencing primers encoded in the flanking vector sequences. Because the inserts were not 
cloned in a directed fashion, it was first necessary to determine the 5' end of each clone by a 
ScaI digestion of the purified plasmid DNA (the CDS SMART primer contains a ScaI site 
allowing the orientation of the insert to be determined). The DNA sequence data obtained 
was subsequently blasted against the non-redundant protein database in GenBank to 
obtain a preliminary annotation of each cDNA clone using the program BLASTX™. 
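
The selection logic of this screening step (inserts above 500 bp, one representative per HaeIII restriction pattern) can be sketched as follows. The data structure, the function name and the example clones are hypothetical. The text does not say which clone of a shared pattern was retained, so keeping the first one encountered is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clone:
    name: str
    insert_bp: int
    hae3_fragments_bp: tuple  # HaeIII fragment sizes read from the gel


def select_for_sequencing(clones, min_insert_bp=500):
    """Keep clones with inserts > min_insert_bp and a not-yet-seen pattern."""
    seen_patterns = set()
    selected = []
    for clone in clones:
        if clone.insert_bp <= min_insert_bp:
            continue  # insert too short to be studied further
        pattern = tuple(sorted(clone.hae3_fragments_bp))
        if pattern in seen_patterns:
            continue  # same restriction pattern as an earlier clone
        seen_patterns.add(pattern)
        selected.append(clone)
    return selected


# Hypothetical example data, for illustration only.
example = [
    Clone("clone-1", 850, (300, 550)),
    Clone("clone-2", 850, (550, 300)),
    Clone("clone-3", 400, (400,)),
    Clone("clone-4", 1200, (200, 400, 600)),
]
print([c.name for c in select_for_sequencing(example)])
```

In practice, fragment sizes read from a gel are approximate, so a real comparison would need a size tolerance rather than exact equality.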

Seed cDNA banks have a high level of redundancy. That is, a small number of seed mRNAs 
have an unusually high level of expression, such as those encoding the seed storage proteins, 
and therefore their cDNAs are very abundant in seed cDNA banks (White et al., 2000, Plant 
Physiology, Vol 124, 1582-1594). Therefore, as soon as the main redundant cDNAs were 
identified in the first round of sequencing the coffee seed cDNA, a pre-screening step was 
added for the white insert containing colonies prior to the determination of insert size. Four 
sequences were very highly expressed and the following specific primer sets were made for 
each of these redundant sequences: 

1) 2S protein, contig 8A 5' AGCAACTGCAGCAAGGTGGAG 3' and contig 8B 5' 
CGATTTGGCACTGCTGTGGTTC 3' (55°C used in PCR, 114 bp fragment), 

2) 2S protein, contig 15A 5' GCCCGTGCTCCTGAACCA 3' and contig 15B 5' 
GTATGGTTGCGGTGGCTGAA 3' (55°C used in PCR, 256 bp fragment), 

3) Oleosin 15.5, contig 30A 5' ACCCCGCTTTTCGTTAT 3' and contig 30B 5' 
TCTGGCTACATCTTGAGTTCT 3' (55°C used in PCR, 261 bp fragment), and 

4) 11S protein, contig 37A 5' GTTTCCAGACCGCCATCAG 3' and contig 37B 5' 
ATATCCATCCTCTTCCAACACC 3' (59°C used in PCR, 261 bp fragment). 

The PCR reactions for this prescreen step were run as follows: 10-30 µL of the white colony 
in sterile H2O, 5 µL 10X Taq buffer (Stratagene), 1 µL 10 mM dNTP, 2.5 µL of each primer at 
20 µM, 1 µL DMSO, 0.5 µL Taq polymerase (Stratagene, 10 U/µL) and sterile H2O was added to 
produce a final reaction total volume of 50 µL. The PCR program was 1 min at 94°C, then 30 
cycles of 1 min at 94°C, 1.5 min at the specific temperature for each primer pair, 2.5 min at 
72°C, followed by 7 min at 72°C. 
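
For orientation, basic properties of the eight prescreen primers listed above can be computed as in the sketch below. The primer sequences are those given in the text. The 2(A+T) + 4(G+C) rule of thumb used here is not taken from this document, and it is only a coarse estimate of melting temperature. The annealing temperatures actually used (55°C or 59°C) are those stated above.

```python
# Prescreen primer sequences (5' to 3'), as listed in the text.
PRESCREEN_PRIMERS = {
    "contig 8A": "AGCAACTGCAGCAAGGTGGAG",
    "contig 8B": "CGATTTGGCACTGCTGTGGTTC",
    "contig 15A": "GCCCGTGCTCCTGAACCA",
    "contig 15B": "GTATGGTTGCGGTGGCTGAA",
    "contig 30A": "ACCCCGCTTTTCGTTAT",
    "contig 30B": "TCTGGCTACATCTTGAGTTCT",
    "contig 37A": "GTTTCCAGACCGCCATCAG",
    "contig 37B": "ATATCCATCCTCTTCCAACACC",
}


def gc_fraction(seq):
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def rule_of_thumb_tm_c(seq):
    """Rough melting temperature estimate: 2°C per A/T plus 4°C per G/C."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


for name, seq in PRESCREEN_PRIMERS.items():
    print(f"{name}: {len(seq)} nt, GC {gc_fraction(seq):.0%}, "
          f"estimated Tm {rule_of_thumb_tm_c(seq)} °C")
```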

Full Length cDNA Insert Sequencing and Sequence Analysis 
cDNA clones whose partial sequences showed initial homologies to proteinases and 
proteinase inhibitors were fully sequenced on both strands using the standard dideoxy primer 
walking strategy. The sequences are shown under SEQ ID Nos. 1, 3, 5 and 7. The full length 
sequences obtained were again blasted against the GenBank non-redundant protein database 
using BLASTX to reinforce the preliminary annotation. 

Sequence identities of sequence pairs were calculated using the ClustalW™ program 
contained in the MegAlign™ module of the Lasergene™ software package (DNASTAR Inc). 
The default parameters were chosen as follows: (1- MULTIPLE ALIGNMENT 
PARAMETERS - Gap penalty 15.00, Gap length penalty 6.66, Delay divergent Seqs (%) 30, 
DNA transition weight 0.5, Protein Weight Matrix-Gonnet Series, DNA Weight Matrix-IUB. 
2- PAIRWISE ALIGNMENT PARAMETERS - Slow/Accurate (Gap Penalty 15.00, Gap 
Length Penalty 6.66), Protein Weight Matrix-Gonnet 250, DNA Weight Matrix-IUB) and the 
sequences used were either the full length nucleotide sequence of each cDNA or the full ORF 
(open reading frame) of each cDNA. 
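
For illustration, a percent identity value can be derived from an already aligned sequence pair as sketched below. This is not the MegAlign™ implementation, and the exact convention used by that software is not stated here. Counting every aligned column that is not a gap in both sequences is an assumption, and the function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Return percent identity over columns not gapped in both sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue  # column carries no information for this pair
        compared += 1
        if a == b:
            matches += 1  # identical residues (a gap never matches a residue)
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


print(percent_identity("ACGT", "ACGA"))
print(percent_identity("AC-T", "ACGT"))
```

Other conventions, for example dividing by the length of the shorter sequence, give different values for the same alignment, so identity figures are only comparable when computed the same way.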

TABLE 2 

Identity values between the nucleic acid and amino acid sequences of CcCP-1, CcCPI-1, and 
CcAP-2 and related genes found in the non-redundant protein database of GenBank and 
those of WO 02/04617. 

cDNA sequences                                                nucleotide      protein identity
                                                              identity (%)    (%) (ORF)
----------------------------------------------------------------------------------------------
CcAP1 vs TcAP1                                                 2.9            13.3
CcAP1 vs TcAP2                                                 2.4             9.8
CcAP2 vs TcAP1                                                55.0            61.5
CcAP2 vs TcAP2                                                55.1            61.3
CcCP-1 vs Arabidopsis thaliana putative cysteine
  proteinase (AY070063)                                       51.8            64.3
CcCP-1 vs Glycine max cysteine endopeptidase (Z32795)         49.1            61.3
CcCP-1 vs Vicia sativa cysteine proteinase
  precursor (Z99172)                                          49.0            60.9
CcAP2 vs Lycopersicon esculentum aspartic proteinase
  precursor (L46681)                                          65.9            71.1
CcAP2 vs Ipomoea batatas putative aspartic proteinase
  mRNA (AF259982)                                             71.7            69.6
CcAP2 vs Nepenthes alata NaAP4 mRNA for aspartic
  proteinase 4 (AB045894)                                     58.4            66.5
CcCPI-1 vs Malus x domestica cystatin (AY176584)              38.8            45.5

Northern-Blot Analysis 

Freshly harvested roots, young leaves, stem, flowers and fruit at different stages of 
development (small green fruit (SG), large green fruit (LG), yellow fruit (Y) and red fruit 
(R)) were harvested from Coffea arabica var. Caturra T-2308 grown under greenhouse 
conditions (25°C, 70% RH) in Tours, France, and from Coffea canephora var. BP409 grown 
either in Ecuador or at ICCRI, Indonesia. The fresh tissues were frozen immediately in liquid 
nitrogen and total RNA was isolated from each tissue using the extraction procedure 
described above. A total of 5 µg of RNA was run on a 1.2% (w/v) denaturing RNA gel 
containing formaldehyde. The total RNA samples from each plant tissue were heated at 65°C 
for 15 min in the presence of 7 µL "RNA Sample Loading Buffer" (without ethidium bromide, 
Sigma), and then put immediately on ice for 2 minutes before being loaded onto the 1.2% 
RNA gel. The gels were run at 60 Volts for 5 hours. The gel was then soaked twice in 10X 
SSC for 20 min. The RNA in the gel was transferred overnight by capillary transfer to a 
"Positive™ Membrane" (Qbiogene) in 10X SSC and the RNA was fixed by heating the blot 
for 30 min at 80°C. Probes were generated using the "Rediprime™ II random prime labelling 
system" kit (Amersham) in the presence of [32P]dCTP. Hybridisation was carried out at 
65°C for 24 h in hybridisation solution (5X SSC, 40 µg/mL denatured salmon sperm DNA, 
5% [w/v] SDS, and 5X Denhardt's solution). Then, the membrane was washed twice at 65°C 
using 2X SSC, 0.1% SDS [w/v] and 1X SSC, 0.1% SDS [w/v] for 30 minutes each. 

5' RACE PCR 

The cDNA insert of clone A5-812 was found to contain introns. Therefore, to confirm the 
coding sequence of this protein, it was necessary to isolate a new cDNA containing the 
complete coding sequence. This was accomplished by using the SMART™ RACE cDNA 
amplification Kit (Clontech). The first strand cDNA used for the 5' RACE was made as 
already described for the cDNA libraries above. A gene specific primer rAP2 (5' 
CATATAATATTAAAAGCACCACCCATAA 3') was designed; this sequence is situated 
92 bp from the poly (A) tail of the A5-812 clone. This specific primer was then used with the 
Universal Primer Mix (UPM) in the CLONTECH kit in a PCR reaction under the following 
conditions: 2.5 µL of first strand cDNA product, 5 µL of 10X Advantage 2 PCR Buffer 
(CLONTECH), 1 µL of dNTP Mix (10 mM), 1 µL of 50X Advantage 2 Polymerase Mix 
(CLONTECH), 5 µL of "Universal Primer A Mix" (10X) (CLONTECH), 1 µL of rAP2 (10 
µM) and sterile water was added to a final volume of 50 µL. PCR cycling conditions were 20 
cycles of 30 sec at 94°C, 30 sec at 68°C and 3 min at 72°C, followed by a final extension 
reaction for 5 min at 72°C. A fragment of about 1700 bp was obtained and excised from the gel 
using the "CONCERT™ Rapid Gel Extraction kit" (GibcoBRL). The isolated fragment was 
cloned in the pCR 4-TOPO vector and transformed into Escherichia coli using the Topo-TA 
cloning kit (Invitrogen). The plasmid obtained was then purified using a plasmid extraction 
kit (QIAfilter Plasmid Midi Kit, Qiagen, France) and the insert of this plasmid was double 
strand sequenced. 

The DNA of clone A5-442 (AP1) was found to lack the 5' region of the cDNA. To isolate 
this region a 5' RACE was performed using the SMART™ RACE cDNA amplification Kit 
(Clontech). A sequence specific primer rAP1 (5'-TGGAGTCACAAGATGTCTCGACGAACTG-3'), 
situated 396 bp from the poly (A) tail, 
was designed. This specific primer was then used with the Universal Primer Mix (UPM) in 
the CLONTECH kit in a PCR reaction under the following conditions: 2.5 µL of first strand 
cDNA, 5 µL of 10X Advantage 2 PCR Buffer (CLONTECH), 1 µL of dNTP Mix (10 mM), 1 
µL of 50X Advantage 2 Polymerase Mix (CLONTECH), 5 µL of "Universal Primer A Mix" 
(10X) (CLONTECH), 1 µL of rAP1, and sterile water was added to a final volume of 50 µL. 
PCR cycling conditions were 20 cycles of 30 sec at 94°C, 30 sec at 68°C and 3 min at 72°C, 
followed by a final extension reaction for 5 min at 72°C. A fragment of about 2,000 bp was 
obtained and excised from the gel using the "CONCERT™ Rapid Gel Extraction kit" (GibcoBRL). 
The isolated fragment was cloned in the pCR 4-TOPO vector and transformed into 
Escherichia coli using the Topo-TA cloning kit (Invitrogen). The plasmid obtained was then 
purified using a plasmid extraction kit (QIAfilter Plasmid Midi Kit, Qiagen, France) and the 
insert of this plasmid was double strand sequenced. 

The Northern blot analysis shown in Figure 1 demonstrates that the coffee cysteine proteinase 
gene CcCP-1 is expressed in the C. arabica coffee cherry at all the stages tested, with 
yellow cherries exhibiting slightly higher levels of expression than the other stages. No 
expression was detected for this gene in the root, stem or leaves of C. arabica. 

The Northern blot analysis shown in Figure 2 demonstrates that the coffee cysteine proteinase 
inhibitor gene CcCPI-1 is expressed in the C. arabica coffee cherry at all stages tested. 
However, in contrast to the expression seen for the cysteine proteinase CcCP-1, CcCPI-1 
exhibits higher expression in the two early stages of coffee cherry development (small green 
and large green), and this gene is expressed at lower levels in the two later stages of cherry 
development. This expression pattern is consistent with the present hypothesis that the 
cysteine proteinase inhibitor protein (CcCPI-1) controls the activity level of the cysteine 
proteinase CcCP-1 in the coffee cherry. A controlling protein such as the cysteine proteinase 
inhibitor protein can be expected to be expressed earlier than its target protein if it is 
necessary to control the level of activity of its target protein continuously from the time that 
the target protein is expressed. No expression was detected for this gene in the root, stem or 
leaves of C. arabica. It is noted that the similarity of the expression patterns for CcCP-1 and 
CcCPI-1 is consistent with the present hypothesis that these proteins interact functionally. 

The Northern blot analysis shown in Figure 3 demonstrates that the coffee cysteine proteinase 
inhibitor gene CcCPI-1 is expressed differently in the cherries of C. canephora (robusta) 
versus the cherries of C. arabica. First, the data of Figure 3 show that the CcCPI-1 gene is 
expressed slightly earlier in C. arabica. Secondly, and more importantly, the CcCPI-1 gene 
is expressed at significantly higher levels in the C. canephora cherries. This difference in 
expression probably affects the level of the cysteine proteinase activity found in C. arabica 
versus C. canephora cherries. Because this class of protein is widely associated with insect 
resistance in plants, it is also likely that the high expression of the CcCPI-1 gene in C. 
canephora contributes to the higher disease resistance often seen for robusta varieties versus 
arabica varieties. 

The Northern blot analysis shown in Figure 4 demonstrates that the coffee aspartic proteinase 
CcAP-2 gene is expressed in both the grain and the pericarp of the C. arabica coffee cherry at 
all cherry development stages tested. The CcAP-2 gene also has a relatively high expression 
in roots. When the film was exposed longer, CcAP-2 expression was also detected in the 
tissues of C. arabica stems, leaf, and flowers. 

Overexpression and under-expression of the CcCP-1, CcAP-1 and CcAP-2 proteinase gene 
sequences and the CcCPI-1 proteinase inhibitor in coffee seeds. 

It is expected that the major storage protein profile and the amino acid/peptide profile can be 
changed in the mature coffee grain by altering, either up or down, the expression of one or 
more of the genes disclosed herein. 

Methods for the overexpression of a gene of interest are well known in the art. Such methods 
consist of creating a chimeric gene of three major components: 1) a promoter sequence at the 
5' end of the gene, preferably in the current application a seed specific promoter such as the 
coffee seed specific promoter described in Marraccini et al. 1999 (Marraccini et al. 1999 
Molecular cloning of the complete 11S seed storage protein gene of Coffea arabica and 
promoter analysis in transgenic tobacco plants, Plant Physiol. Biochem. Vol 37, 273-282, and 
WO 99/02688), 2) the entire coding sequence of the gene to be expressed, and 3) a 3' control 
region such as the 3' region from the nopaline synthase gene from the T-DNA of the Ti 
plasmid of Agrobacterium tumefaciens. Then, the chimeric gene can be cloned into an 
Agrobacterium tumefaciens transformation vector, and this vector can be transformed into an 
Agrobacterium tumefaciens strain for use in coffee transformation as described by Leroy et al. 
2000 (Leroy et al. 2000 Genetically modified coffee plants expressing the Bacillus 
thuringiensis cry1Ac gene for resistance to leaf miner. Plant Cell Reports 2000, 19, 382-389). 
Plants with stable transformation inserts can then be screened for those which overexpress the 
specific genes used in the transformation experiment specifically in mature seeds, using 
methods such as detection of gene overexpression or protein activity overexpression versus 
seeds from mock transformed plants. 

It is well known in the art that the expression of known gene sequences can be reduced or 
completely blocked by antisense suppression and cosuppression of gene expression using nucleic acid 
fragments representing less than the entire coding region of a gene, and by nucleic acids that 
do not share 100% sequence identity with the gene to be suppressed. In this case, the 
sequences chosen for the particular antisense suppression or cosuppression experiment will 
replace the full length gene in the chimeric gene construction scheme presented above. The 
resulting antisense suppression or cosuppression chimeric constructions are again cloned into 
an Agrobacterium tumefaciens transformation vector, and transformed into an Agrobacterium 
tumefaciens strain for use in coffee transformation as described above. Plants with stable 
transformation inserts can then be screened for those with reduced expression of the specific 
gene sequences used in the seeds of the transformed plants. The reduced expression can be 
detected by techniques such as northern blotting, semi-quantitative RT-PCR, and/or 
quantitative RT-PCR. 



Another method for reducing, or eliminating, the expression of a gene in plants is to use 
small portions of the gene sequences disclosed herein to produce RNA silencing via 
RNAi (Hannon, G.J., 2002, Nature, Vol 418, 244-251; Tang et al., 2003, Genes Dev, Vol 17, 
49-63). In this approach, small regions of one or more of the sequences disclosed herein are 
cloned into an Agrobacterium tumefaciens transformation vector as described above which 
has a seed specific promoter and an appropriate 3' regulatory region. This new inserted 
sequence for RNAi should be constructed so that the RNA produced forms an RNA structure 
in vivo which results in the production of small double stranded RNA in the transformed cells 
and whereby these small double stranded RNA sequences trigger the degradation of the 
homologous mRNA in these transformed cells. 
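
One way to obtain an RNA that folds back on itself is an inverted-repeat (hairpin) insert, in which a target fragment is followed by a spacer and then by its own reverse complement. The sketch below illustrates that arrangement only. The text does not specify a particular construct design, so the hairpin layout, the function names and the example sequences are all assumptions.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def hairpin_insert(target_fragment, spacer):
    """Return sense arm + spacer + antisense arm (an inverted repeat).

    When transcribed, the two arms can base pair with each other and
    form a double stranded stem, with the spacer as the loop.
    """
    return target_fragment + spacer + reverse_complement(target_fragment)


# Hypothetical short sequences, for illustration only.
print(hairpin_insert("ATGGCCTTAG", "GTAAGT"))
```

A real construct would use a much longer target fragment taken from one of the sequences disclosed herein.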

Screening for naturally occurring variations in the CcCP-1, CcAP-1, CcAP-2, CcCPI-1 
genes and creating new mutations in these genes. 

The sequences disclosed herein can be used to screen natural populations for allelic variants 
in these genes. This can be accomplished by using the CcCP-1, CcAP-2, CcCPI-1 sequences 
as probes in a search for naturally occurring RFLPs (restriction fragment length 
polymorphisms) in genomic DNA from different coffee plant varieties. A more powerful 
method to find allelic variants is to use the mutation screening technology associated with the 
TILLING method (Till, B.J., et al. 2003 Large-scale discovery of induced point mutations 
with high-throughput TILLING. Genome Research Vol 13, 524-530). In this case, once a 
specific gene sequence has been isolated and cloned, such as the CcCP-1, CcAP-2, CcCPI-1 
sequences herein, the mutation screening technique associated with the TILLING method can 
be used to identify sequence variants between the cloned sequence and the corresponding 
cDNA or genomic sequence in different varieties. Using PCR primer pairs coding for DNA 
segments of 700-1100 base pairs, the known cloned gene can be scanned for naturally 
occurring sequence variations in different varieties. In the ideal situation, one or more 
sequence variants could also be correlated with a particular phenotypic variation thereby 
identifying a genetic marker for this phenotypic variant. 
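
The division of a cloned sequence into PCR segments for such a scan can be sketched as follows. The 700-1100 base pair segment size comes from the text, and the 1543 nucleotide example length is the length stated for SEQ ID No. 1. The default segment length, the overlap between neighbouring segments and the function name are assumptions chosen for illustration.

```python
def tile_amplicons(gene_length_bp, amplicon_bp=1000, overlap_bp=100):
    """Return 1-based inclusive (start, end) windows covering the gene."""
    if gene_length_bp <= 0:
        raise ValueError("gene length must be positive")
    if not 0 <= overlap_bp < amplicon_bp:
        raise ValueError("overlap must be non-negative and below amplicon size")
    windows = []
    start = 1
    while True:
        end = min(start + amplicon_bp - 1, gene_length_bp)
        windows.append((start, end))
        if end == gene_length_bp:
            break
        start = end - overlap_bp + 1  # step back so neighbours overlap
    return windows


# Example: a sequence of 1543 nucleotides, the length given for SEQ ID No. 1.
print(tile_amplicons(1543))
```

The final window can come out shorter than 700 base pairs. In that case its start can be moved upstream so that every segment stays within the stated size range.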

Additionally, using the sequences disclosed herein for CcCP-1, CcAP-2 and CcCPI-1, 
application of the full TILLING method can be used to create and detect new mutants in 
these genes and thus produce plants containing these specific mutants. For example, using 
the full TILLING method, coffee plants could be created which have specific mutations, such 
as a missense mutation in the coding sequence which inactivates the gene target of interest. 

CLAIMS: 

1. An isolated polynucleotide comprising a nucleotide sequence encoding a polypeptide 
having cysteine proteinase activity, wherein the amino acid sequence of the polypeptide 
and the amino acid sequence of SEQ ID No. 2 have at least 70%, preferably at least 80%, 
sequence identity based on the ClustalW alignment method; or the complement of the 
nucleotide sequence, wherein the complement contains the same number of nucleotides as 
the nucleotide sequence, and the complement and the nucleotide sequence are 100% 
complementary. 

2. The polynucleotide of Claim 1, wherein the amino acid sequence of the polypeptide and 
the amino acid sequence of SEQ ID No. 2 have at least 85%, preferably at least 90%, 
optionally at least 95%, sequence identity based on the ClustalW alignment method. 

3. The polynucleotide of Claim 1, wherein the nucleotide sequence comprises the nucleotide 
sequence of SEQ ID No. 1. 

4. The polynucleotide of Claim 1, wherein the polypeptide comprises the amino acid 
sequence of SEQ ID No. 2. 

5. An isolated polynucleotide comprising a nucleotide sequence encoding a polypeptide 
having cysteine proteinase inhibitor activity, wherein the amino acid sequence of the 
polypeptide and the amino acid sequence of SEQ ID No. 4 have at least 70%, preferably 
at least 80%, sequence identity based on the ClustalW alignment method; or the 
complement of the nucleotide sequence, wherein the complement contains the same 
number of nucleotides as the nucleotide sequence, and the complement and the nucleotide 
sequence are 100% complementary. 

6. The polynucleotide of Claim 5, wherein the amino acid sequence of the polypeptide and 
the amino acid sequence of SEQ ID No. 4 have at least 85%, preferably at least 90%, 
optionally at least 95%, sequence identity based on the ClustalW alignment method. 

7. The polynucleotide of Claim 5, wherein the nucleotide sequence comprises the nucleotide 
sequence of SEQ ID No. 3. 

8. The polynucleotide of Claim 5, wherein the polypeptide comprises the amino acid 
sequence of SEQ ID No. 4. 

9. An isolated polynucleotide comprising a nucleotide sequence encoding a polypeptide 
having aspartic endoproteinase activity, wherein the amino acid sequence of the 
polypeptide and the amino acid sequence selected from SEQ ID No. 6 or 8, preferably 
SEQ ID No. 8, have at least 75%, preferably at least 80%, sequence identity based on the 
ClustalW alignment method; or the complement of the nucleotide sequence, wherein the 
complement contains the same number of nucleotides as the nucleotide sequence, and the 
complement and the nucleotide sequence are 100% complementary. 

10. The polynucleotide of Claim 9, wherein the amino acid sequence of the polypeptide and 
the amino acid sequence selected from SEQ ID No. 6 or 8, preferably SEQ ID No. 8, 
have at least 85%, preferably at least 90%, optionally at least 95%, sequence identity 
based on the ClustalW alignment method. 

11. The polynucleotide of Claim 9, wherein the nucleotide sequence comprises the nucleotide 
sequence of SEQ ID No. 5 or 7, preferably SEQ ID No. 7. 

12. The polynucleotide of Claim 9, wherein the polypeptide comprises the amino acid 
sequence of SEQ ID No. 6 or 8, preferably SEQ ID No. 8. 

13. A vector comprising the polynucleotide of any one of Claims 1 to 12. 

14. A recombinant DNA construct comprising the polynucleotide of any one of Claims 1 to 
12 operably linked to a regulatory sequence. 

15. A method for transforming a cell comprising transforming a cell with the polynucleotide 
of any one of Claims 1 to 12. 

16. A cell comprising the recombinant DNA construct of Claim 14. 

17. The cell of Claim 16, which is a prokaryotic cell, a eukaryotic cell or a plant cell, 
preferably a coffee cell. 

18. A transgenic plant comprising the cell of Claim 16 or 17. 

19. A method for modulating coffee flavour precursor levels in green coffee grains, the 
method comprising introducing into the coffee plant the recombinant DNA construct of 
Claim 14. 

Abstract 

Modulation of Coffee Flavour Precursor Levels in Green Coffee Grains 

The present invention relates to isolated polynucleotides encoding cysteine proteinases; 
cysteine proteinase inhibitors; and aspartic endoproteinases. 

The invention also relates to a transformed host cell, preferably a plant cell, in which over- or 
under-expression of these polynucleotides results in altered levels of coffee flavour 
precursors, specifically, amino group-containing molecules such as amino acids, peptides and 
proteins, in green coffee grains. 

Figure 1: Northern blot analysis of the expression of the cysteine proteinase (CcCP-1) gene in 
different tissues of Coffea arabica. 

Figure 2: Northern blot analysis of the expression of the cysteine proteinase inhibitor 
(CcCPI-1) gene in different tissues of Coffea arabica. 

Figure 3: Northern blot analysis of the expression of the cysteine proteinase inhibitor gene 
(CcCPI-1) at different cherry development stages for Coffea arabica (ARA) and Coffea 
canephora (ROB). 

Figure 4: Northern blot analysis of the expression of the aspartic proteinase 2 (CcAP2) gene 
in different tissues of Coffea arabica. 

SEQUENCE LISTING 

<110> Tours Nestle Research Center 

<120> Flavour Implication of Proteinase And Proteinase Inhibitor in Coffee 

<130> patent Proteinase and Proteinase inhibitor coffee 

<160> 8 

<170> PatentIn version 3.1 

<210> 1 
<211> 1543 
<212> DNA 
<213> Coffea canephora 

<220> 
<221> mRNA 
<222> (1)..(1543) 
<223> 

<220> 
<221> CDS 
<222> (122)..(1315) 
<223> 

<400> 1 

aagcagtggt aacaacgcag agtacgcggg ggacactcct ccccgttcca ttccagacca 
gggtccaaaa ccaccgtcca agagaggagc agactgcaga gtgatacata caggcacaaa 120 

9 ss? ss? S3 as a? a; as ss S3 a as 3s as a as is 169 

1 5 10 15 

acc etc tta tec tgc gca etc ate tct tea acc act ttc caa cat aaa :>i7 
Thr Leu Leu ser cys Ala Leu He ser ser Thr Thr Ke Gl£ Sis SS 217 
^° 25 30 

^ £ a 9 tat cga gta caa gac ccg tta atg ata cqc caa ate acc oar 
He Gin Tyr Arg val Gin Asp PrS Leu Met ill A ?g 8ln ?al ?hr Sp 265 
as 40 45 K 

Acn S?c S!* S?« f?c cac cac cca ggt agg tct tct gca aac cat cgt 313 
Asn His His His Arg His Hs Pro Gly Arg ser ser Ala Asn His Arg 
3U 55 60 

20 ss ss ss as as ss s?s §s? ss as ss ss « sa its as 361 

70 75 80 

tac gag aaa act tac tct acg cac gag qaq tac ata car rnr rtn «r»^ vino 
25 Tyr Glu Lys Thr Tyr ser Thr His §lS Gl§ Tyr 8a? Sis Arg Leg If? 409 






90 3 95 



205 



2*2 5 9t 9at tta aaa 9 aa aaa gat gac tqt aat aat aaa tar 
His Met cys Asp Leu Lys Glu Lys Asp Asp c|s Bp £p 8?y §£ ler 

«u 235 240 

65 w a? a ss as a;? as a< as s ss ss as as a? ss 

250 ?cc 



457 



?? t H5 C g f c aag aac ctc atc aa 9 Qcc gcg qaq cac caa arc am nar 
He Phe Ala Lys Asn Leu He LyI Xla III ifg Sis Gin* Ala Se? Eg 
xUU 105 HO 

ss is in ss ss ss ss? ss as is? ss ss as a?s as 505 

x " 120 125 

as as as as as? s? ss? gs ss ss a* a? as a? ss? as 553 

13 5 140 

40 a| as % as a a? ss s; ss a? as as ss as as as 601 

150 155 160 

§3 SS? SS? SS §? S? SS S3 SS as sir SS SS 3? 35 85 649 

xb:> 170 175 

ss as as g| as as ?a ss » as a? 23 3s ss ss? 3s 697 

xOU 185 190 ' 

as as gs ss? s as ss as ss ss ss as iss ss ss as 745 



SS § Si £2 SS ISS ESS SS? SS SR 38 SS 83 SS 31 SS 793 



215 220 

60 Sis 53 c?4 En fvt ?2 a S*S «5 5S5 ?at gat gga tgc tec 841 

ser 
240 

}t 889 

250 — 25! " 

5fi ffi §12 1| «H S5 Sf c pfo c m B fl» K SS ?? y ff» $ 937 

• " 265 270 



29 



ftl ll c aat £ ct S? g aaa gt ? 9f9 9*9 aaa 9tg c 99 aat ttc gca aaa 985 

Lys phe Asn pro Glu Lys Val Ala Val Lys Val Arg Asn Phe Ala Lys 

275 280 285 

ate cct gag gat gag agt caa att get gec aat gta gtg cat aat ggc 1033 

290 ASP 295 11 6 AU AS " 300 Hl ' S AS * 

10 ccg ctt get att gga ttg aat gcg gta ttc atg caa act tac ate gqq 1081 

Pro Leu Ala He Gly Leu Asn Ala Val Phe Met Gin Thr Tyr He G?y 

305 310 . 315 320 

ggt gtg tea tgt cct ctt att tgt gac aaa aag agg ate aac cat qqt 1129 

15 Gly vaT ser cys Pro Leu He Cys Asp Lys Lys Arg He Asn His G?y 

325 330 335 ' 

8!S ?* x ? zt ?, tg g ? c tat 99* tct aga ggc ttc tea ate ctt agg ctt 1177 

val Leu Leu VaT Gly Tyr Gly Ser Arg Gly Phe Ser lie Leu Arg Leu 






on ' 7 ' ~ j ■ "ijr «' y «iy rue ser xie ueu 

*V 340 345 350 

ggc tac aag cca tac tgg att ate aag aac tea tgg ggg aag cgt tag 1225 

Gty Tyr Lys Pro Tyr Trp He lie Lys Asn ser Trp Gly LyI Arg Trp 

355 360 365 

ggc gaa cat ggt tgc tac egg ctt tgt cga ggg cac aac atg tqt qaa 

GTy Glu ms Gty cys Tyr Arg Leu cys Arg GTy His Asn Met cys G?y 



<210> 2 

<211> 397

<212> PRT

<213> Coffea canephora

<400> 2

Met Met Met Thr Ser Gly Gly Leu Met Leu Thr Cys Thr Leu Ala Ile
1 5 10 15

Thr Leu Leu Ser Cys Ala Leu Ile Ser Ser Thr Thr Phe Gln His Glu
20 25 30

Ile Gln Tyr Arg Val Gln Asp Pro Leu Met Ile Arg Gln Val Thr Asp

35 40 45

Asn His His His Arg His His Pro Gly Arg Ser Ser Ala Asn His Arg

50 55 60 



1273 



atg agc aca atg gtt tca gct gtg gtg aca cag acc tct taa 1315

Met Ser Thr Met Val Ser Ala Val Val Thr Gln Thr Ser
385 390 395

taccaaaaca tctctgctct tcagaggttg tatacaaggt ggtttgctct tggaagatct 1375

tatcatgttt tcgaaatatt taggtttgta taatatgaag ggtagagagt aataagaacc 1435

aaacaaaagt tcaggcctgt ttctgatagg aatggaatat gatcggagtc atttgttact 1495

ggatcacaaa aaaaaatcca aaaaaaaaaa aaaaaaaaaa aaaaaaaa 1543 



30 



Leu Leu Gly Thr Thr Thr Glu Val His Phe Lys Ser Phe Val Glu Glu

75 80
Tyr Glu Lys Thr Tyr Ser Thr His Glu Glu Tyr Val His Arg Leu Gly



95 






Ile Phe Ala Lys Asn Leu Ile Lys Ala Ala Glu His Gln Ala Met Asp

110

Pro Ser Ala Ile His Gly Val Thr Gln Phe Ser Asp Leu Thr Glu Glu



120 125 
Glu Phe Glu Ala Thr Tyr Met Gly Leu Lys Gly Gl^ Ala Gly val Gl; 



140 



Gly Thr Thr Gln Leu Gly Lys Asp Asp Gly Asp Glu Ser Ala Ala Glu

150 155 160

Val Met Met Asp Val Ser Asp Leu Pro Glu Ser Phe Asp Trp Arg Glu
165 170 175

Lys Gly Ala Val Thr Glu Val Lys Thr Gln Gly Arg Cys Gly Ser Cys

185 190

Trp Ala Phe Ser Thr Thr Gly Ala Ile Glu Gly Ala Asn Phe Ile Ala
195 200 205

Thr Gly Lys Leu Leu Ser Leu Ser Glu Gln Gln Leu Val Asp Cys Asp

His Met Cys Asp Leu Lys Glu Lys Asp Asp Cys Asp Asp Gly Cys Ser

235 240

Gly Gly Leu Met Thr Thr Ala Phe Asn Tyr Leu Ile Glu Ala Gly Gly

245 250 255

Ile Glu Glu Glu Val Thr Tyr Pro Tyr Thr Gly Lys Arg Gly Glu Cys

Lys Phe Asn Pro Glu Lys Val Ala Val Lys Val Arg Asn Phe Ala Lys

Ile Pro Glu Asp Glu Ser Gln Ile Ala Ala Asn Val Val His Asn Gly

290 295 300

Pro Leu Ala Ile Gly Leu Asn Ala Val Phe Met Gln Thr Tyr Ile Gly

310 315 320

Gly Val Ser Cys Pro Leu Ile Cys Asp Lys Lys Arg Ile Asn His Gly
Val Leu Leu Val Gly Tyr Gly Ser Arg Gly Phe Ser Ile Leu Arg Leu
340 345 350

Gly Tyr Lys Pro Tyr Trp Ile Ile Lys Asn Ser Trp Gly Lys Arg Trp
355 360 365

Gly Glu His Gly Cys Tyr Arg Leu Cys Arg Gly His Asn Met Cys Gly

370 375 380

Met Ser Thr Met Val Ser Ala Val Val Thr Gln Thr Ser
385 390 395

<210> 3 

<211> 726 

<212> DNA 

<213> Coffea canephora
<220>

<221> mRNA

<222> (1)..(726)
<223>

<220>

<221> CDS
<222> (79)..(498)
<223> 

<400> 3 

ggcgcaacaa acattgaaag aaaatcaaga acccaaaaaa accccacaag aaaaaaagaa 60 

aaagaagaag aaaagcca atg gca aaa cca tcg tca tct cta ctc aca ctt 111

Met Ala Lys Pro Ser Ser Ser Leu Leu Thr Leu
1 5 10

S ct l cc t f t ctt ctg atc t-tt tt; c att ctt gca eta ttt tec acc acc isq 
Pro Ser Phe Leu Leu Ile Phe Phe Ile Leu Ala Leu Phe Ser Thr Thr
15 20 25 

llu rin %li !!« 2if } t9 2 ga agg aaa gtg gga gca agg gag aag att 207 
Leu Gln Val Asn Ala Leu Gly Arg Lys Val Gly Ala Arg Glu Lys Ile






40 



70 75 



255 



2iR ? at S tg aag agc aac aaa g aa gtt caa gaa ctt ggg gaa tat tgt 
Glu Asp Val Lys Ser Asn Lys Glu Val Gln Glu Leu Gly Glu Tyr Cys

45 50 55

f£i 5 ct S!? 9 tac aac aag a 9 t tt: 9 egg aag aag aac aac gaa agt gqt 303 
Val Ser Glu Tyr Asn Lys Ser Leu Arg Lys Lys Asn Asn Glu Ser Gly






HJ S3 SB IS SS ?S? S 83 83 Kg jts g?g 51 <?* «a gjj ^ 

BU 85 90 

5 HI SS g: 5f gf 25 51 IS 58 ffi a* ? h « t« « w 399 
.o 83 SS §| 83 5f SS IS K 83 83 83 2? £ 83 S?S 447 

115 120 

I5 « || SS ft §K SSS £2 as £ I" SS £? K SS ?ff 51 495 

tga agaagaaaat gttgaaaaag ttggaactgt ttgggagatc taatctgatg 548 
attattagta cctttcagtg caaattctct ttgctgttaa gtgttcggtt tttttttttt 608
ccctgtgtct atttatgacc gtggtcatga tgatatggtg tatgatccag taataattaa 668 
aatctgttgc ataaaaaaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaa 726 
<210> 4 
<211> 139 
<212> PRT

<213> Coffea canephora 






<400> 4 



Met Ala Lys Pro Ser Ser Ser Leu Leu Thr Leu Pro Ser Phe Leu Leu
1 5 10 15

Ile Phe Phe Ile Leu Ala Leu Phe Ser Thr Thr Leu Gln Val Asn Ala

20 25 30

Leu Gly Arg Lys Val Gly Ala Arg Glu Lys Ile Glu Asp Val Lys Ser

35 40 45

Asn Lys Glu Val Gln Glu Leu Gly Glu Tyr Cys Val Ser Glu Tyr Asn

50 55 60

Lys Ser Leu Arg Lys Lys Asn Asn Glu Ser Gly Ala Pro Ile Ile Phe

65 70 75 80

Thr Ser Val Val Glu Ala Glu Lys Gln Val Val Ala Gly Ile Lys Tyr

Tyr Leu Lys Ile Lys Ala Thr Thr Ser Ser Gly Val Pro Lys Val Tyr

100 105 110

Asp Ala Ile Val Val Val Arg Pro Trp Val His Thr Lys Pro Arg Gln

Leu Leu Asn Phe Ser Pro Ser Pro Ala Thr Lys
130 135 

<210> 5

<211> 2282

<212> DNA

<213> Coffea canephora

<220>

<221> mRNA

<222> (1)..(2282)
<223>

<220>

<221> CDS

<222> (439)..(1731)
<223>

<400> 5

actcactata ctttgcattc tcttcaccat tctccctcaa aactccctcc aacattcttt 60 

tccttggttt tttcatctat ccctcctata aaaatcgatt attttgttct tttacctctt 120 

aaaaatccat tcttggaatt catttatcca tatacaccat acttgtgcat gtcccttttg 180 

gttgttttgc ttttgtgata agtaattgtt ggtttattgg tttttcatga tggctccgga 240 

tctaagaaga aatgggtcgg tagtagcttt agccctgtta gtctctctgg ttgttaatgg 300

tgttattttt gatgtagaag gtaacaataa tgtggttttt gaggtggaac ataaatttaa 360 

agggagaagg aatgagaatg gaggaagagg gtctttttga cttcactcaa ggctcatgat 420 






45 50 

agt ctt ggt att gac ttg act
Ser Leu Gly Ile Asp Leu Thr
60 65

gca gcc
Ala Ala

ctt
Leu
5

gac atg
Asp Met

cct
Pro

ttg
Leu

ggt ggc
Gly Gly
10

gcg ctc tat
Ala Leu Tyr
20

ttc act
Phe Thr

aag
Lys

ctt
Leu
25

tcg
Ser

att
Ile

tat gtg
Tyr Val

caa
Gln

gtg
Val

gat
Asp

aca gga
Thr Gly
40

agt
Ser

gac
Asp

ggt tgt
Gly Cys

gtc
Val

aga
Arg

tgc
Cys
55

ccc
Pro

aag
Lys

aaa
Lys

agc
Ser

cta tat gac
Leu Tyr Asp

atg
Met
70

aaa
Lys

gcc
Ala

tcc
Ser

agc
Ser

acc
Thr
75



471 



55 Asn Gfy Ser Pro Thr Xtp %X 2£ S5 ctt teg att 519 

6Q H? «? So Pro JBJ T yr Tyr W Gin Sa? ?fp ?£? I?? 5S SJ 567 



615 



663 






105 



25 



Val Asp Gly Ile Ile Gly Phe Gly Gln Ala Asn
±w 165 170 

IS B? IB IB B? as BS SS g JS if? 51 S3 g| - g- 



55 



^ 300 — " S 0 5 Vdl va ' ier Pfle H" Phe Glu Asp ser Ceu 

60 310 315 



65 



711 



r?S f" 8*? a 5 t tgt 9 at caa Sac ttt tgc ttg tct aca ttc aar 

Arg Leu Val JS P CYS ASP 61 n Asp P * e Le " s2 All Phe aS 
ou 85 90 

*S SS BS B? Sp $ Si S3 If? SS SI SI SB B? B? 83 759 



io *? B? if? £ If? 11? S BS i?5 if? B? SS S3 « SS B? ™ 

15 at g ss a; i?„ as if? as bs as b; ib ss 3 - «s 

i| b? a: §3 |?| $ s e sc sr s if? m a if? 903 

20 ' XH:> 150 155 

e s s 8R ?s ?3 is §f? sr jfs gf y as r % k as »» 



999 



30 SB S SB % 22 SB If? Iff SS If? If? If? IB SB SB IB 1047 

xys 200 

35 If? as ?3 ?3 SR SB gS 2? 52 IS? IS SS B2 ?3 % J2 1095 
111 ? " s?l ~ SS I «H 21 SS )S IB ffi S3 If? If? SI 



1191 



?3 £2 SS SS Pro B? SS S3 2 If? If? If? Br If? IS if? 

245 250 

45 Br IB IB fj II? if? B? SS ||g SB B? BS SS fj SS ?3 1239 
50 B? B? S™ SS S3 i?S 52 IB B? SB BS IB B? S BS 51 1287 



260 265 

acg gca tcc caa tcc
Thr Ala Ser Gln Ser
275 280 

IB SB IB S3 HS SS HI BS 58 ® Be S3 B? II? if? BS 1335 
S3 BS ?S if? SB pS| S3 S3 B? SB SS BS IB SS B? 2S 1383 



1431 



B? B2 IS S3 1? SSS SB IB B? BS SB SS BS SB SS BS 

325 330 
SIS SS c?s Hi gfy S? STS Asn ??§ §?y » SK » |?| Q fl- "79 
agg gaa gta act ctt ttg gga gat ctt gta ctc gca aac aag ctt gtt



1527 






<210> 6 
<211> 430 
<212> PRT

<213> Coffea canephora
<400> 6

Met Leu Ala Ala Leu Asp Met Pro Leu Gly Gly Asn Gly Ser Pro Thr

10 15

Asp Ala Ala Leu Tyr Phe Thr Lys Leu Ser Ile Gly Thr Pro Pro Gln

30

Asp Tyr Tyr Val Gln Val Asp Thr Gly Ser Asp Ile Leu Trp Val Asn

45

Cys Ala Gly Cys Val Arg Cys Pro Lys Lys Ser Ser Leu Gly Ile Asp



1575 



Arg Glu Val Thr Leu Leu Gly Asp Leu Val Leu Ala Asn Lys Leu Val
B jg asp S If S S fR « fl» & p |f| ffi « - £ 

IIS S * 5G » 28 « £ gfg !5S S R £ ffl Sf JG 

390 395 

83 «. S SB K SC S IS 3K ffi ??5 32 S ; jrj «• «. »i 

«k a as § a 22 a as as « s 22 a sg k a — 

22 Ile Pro tga acatttaaaa tcatactagc tgagaaggag gcattatgat 1771
430

agcgtaccat ggtactcata gtgatcaggc atcttgctga ttctttggac cattataatt 1831
tctcatgtgt ttaaagtgaa agtcagttcg tcgagacatc ttgtgactcc ataatcttct 1891
tgatcaagct gaactctact cacaaaacca tagctaattc ttttgatctc aaaagagaaa 1951
taggctctgc aaaaggattt cggaggttga tgttgaacat tcttcttatt tggatgttat 2011
tgatacccca gatgattaag gaaagcctat aggaaacaga tggtgggaag gagtatacat 2071
tctttctgac tctttggaac ttcctagcgt atacacatat ttcacacgga atgtatctta 2131
taattcatct gttctttctg tttattgtca acttgtttca aatgattgga gtagctgcaa 2191
taatcaactc ggatggtggt tcatgcttaa ggctcgtctt gcctcattgt taagacgtga 2251
aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa a 2282



Leu Thr Leu Tyr Asp Met Lys Ala Ser Ser Thr Gly Arg Leu Val Thr

70 75 80

Cys Asp Gln Asp Phe Cys Leu Ser Ala Phe Asn Ala Pro Ala Ser Asp



90 



95 



Cys Lys Val Gly Asn Pro Cys Ala Tyr Ser Val Thr Tyr Gly Asp Gly



105 



110 



Ser Ser Thr Gly Gly Tyr Phe Val Arg Asp Tyr Ala Lys Leu Asn Gln

120 125

Leu Thr Gly Asn Leu Gln Thr Ile Pro Met Asn Gly Ser Ile Val Phe

135 140

Gly Cys Ser Ser Gln Gln Ser Gly Glu Leu Gly Ser Ser Thr Glu Ala



155 



160 



Val Asp Gly Ile Ile Gly Phe Gly Gln Ala Asn Ser Ser Ile Ile Ser
165 170 175

Gln Leu Ala Ser Ala Gly Lys Val Lys Lys Ile Phe Ser His Cys Leu

190

Asp Gly Ile Asn Gly Gly Gly Ile Phe Ala Ile Gly Gln Val Val Gln



200 



205 



Pro Lys Leu Lys Thr Thr Pro Leu Val Pro Asn Glu Ala His Tyr Asn 



220 



Val Val Leu Asn Ala Ile Glu Val Gly Gly Asp Val Leu Asn Leu Pro

230 235 240

Ser Asp Val Leu Gly Gly Gly Ser Gly Ser Gly Thr Ile Ile Asp Ser

250 255

Gly Thr Thr Leu Ala Tyr Leu Pro Asp Asp Val Tyr Thr Pro Leu Met

260 265 270



Glu Lys Ile Thr Ala Ser Gln Ser Asn Leu Lys Ile His Ile Val Glu

280 285

Asn Gln Phe Lys Cys Phe Val Tyr Ser Gly Asn Val Asp Asp Gly Phe

300

Pro Val Val Ser Phe His Phe Glu Asp Ser Leu Ser Leu Thr Val Tyr

315 320

Pro His Glu Tyr Leu Phe Asp Leu His Asg Asp Gln Trp Cys Ile Gly



330 



335 






Trp Gln Asn Lys Gly Met Gln Thr Arg Asp Gly Arg Glu Val Thr Leu

340 345 350

Leu Gly Asp Leu Val Leu Ala Asn Lys Leu Val Ser Tyr Asp Leu Glu

355 360 365 

10 AS " Thr 11 6 Gly Trp A I a Glu Tvr Asn C V S ser ser ser He Lys 

370 375 380

Leu Arg Asp Glu Lys Ser Gly Asn Val Tyr Ala Val Gly Ser His Ile

385 390 395 400






Ile Ser Ser Ala Arg Gly Leu Asn Ala Gly Lys Ala Leu Arg Phe Leu
405 410 415

Leu Leu Ile Ile Thr Ser Leu Leu His Ala Leu Leu Ile Pro
420 425 430 



<210> 7

<211> 1819

<212> DNA

<213> Coffea canephora

<220>

<221> mRNA
<222> (1)..(1732)
<223>

<220>

<221> CDS
<222> (79)..(1602)
<223>

<400> 7

cttactgact ctcgtatatt attcaatcta tcttttgagt tttgcaagag cccatcaagc 60 

atcaaggcat aaccaacg atg gag agg agg tac ctt tgg gca gca ttt gta 111
Met Glu Arg Arg Tyr Leu Trp Ala Ala Phe Val

1 5 10

tta ggg gcg att gtg tgt tct cta ttt cct ctt cct tct gaa gga tta 159
Leu Gly Ala Ile Val Cys Ser Leu Phe Pro Leu Pro Ser Glu Gly Leu
15 20 25

tti l^i Til li c P tg aaa aaa aaa ccc tta 9at att caa age ata aga 207 
Lys Arg Ile Ser Leu Lys Lys Lys Pro Leu Asp Ile Gln Ser Ile Arg
30 35 40






ffi § 3? £ bs as 22 m is £ sr gp £ g» 

K «« £ £ £ gS H$ £ £ JK |?g £ Sfi £ £ ffi 



75 



51 as 5? a ff Hs as £ £ « as £ a? £ aj is? 

85 90 

£ £ 3S gl £ IS? £ SB £ £ £ ^ jg s ffi «« 
£ 83 £ IS SS 5! £ £ % £ £ £ £ £ 25 Sfi 

xx i 120 

£ §| £ 58 £ 5i 3j £ £ IS? £ IS? £ £ - 
£ £ £ lie £ §j g? y £ H- £ g| « I?? £ £ £ 

150 155 

as £ ss S3 ij £ a; £ 25 f k? 3? ss as ss? £ 

• LW 165 170 

sr as jr £ a- as a? is as is? a* ? s a* ^ «, 

/:> 180 I85 

£ a? §1 25 g? y 25 a? £ as as as £ ?i? 215 ss si? 

xy5 200 

S3 £ £ £ £ £ £ St £ £ if y 25 8? 115 g?S as 
S3 £ £ £ ft 25 £ £ £ £ s 85 gfS £ g?? gf y 

230 235 

g?S 25 £ £ gg flt 8H £ IS? £ ai £ 38 ^ 51 Sfi 



250 



IS? £ S3 £ £ £ as 31 fg £ ISg a? £ 3f £ g? y 



265 



£ £ £ £ g? y £ £ £ IS? g? y £ £ gg g ?y gg g 
£ £ £ S3 £ £ g| IS? IS? 2S 25 £ £ SSS £ £ 

£ si? £ a? a* £ as £ £ a? £ Z g ?y ? « ^ 

3V:> 310 315 

act gaa tgt aaa gaa att gtt tca cag tat ggt gaa ctg att tgg gat

Thr Glu Cys Lys Glu Ile Val Ser Gln Tyr Gly Glu Leu Ile Trp Asp
320 325 330 



Leu Leu S3 ser 1?S S3 £2 S£ ffi SJ S3 gj gj §S fij i?J 



ctc ctc

Va1 335 "* — 345 ~" gig 

tta tgt ccc ctt cgt ggt get cag cat gag aat act tar at-r aan tra 
Leu Cys Pro Leu Arg Gly Ala Gln His Glu Asn Ala Tyr Ile Lys Ser
350 355 360

vai 83i ? ac 2? g i? 9 aac aa9 9?9 gaa get tct gtt ggt gaa tec cca 
Val Val Asp Glu Glu Asn Lys Glu Glu Ala Ser Val Gly Glu Ser Pro

375 

atg tgt act get tgt gaa atg get gtt gtt tgq ata caa aac ran rtn 
Met Cys Thr Ala Cys Glu Met Ala Val Val Trp Met Gln Asn Gln Leu
380 385 390 395

fvl ri^ r?2 £ ga . aag S? 9 aaa gta ctt fl ca tat gtg aat cag ctt 
Lys Gln Gln Gly Thr Lys Glu Lys Val Leu Ala Tyr Val Asn Gln Leu
400 405 410

tgt gaa age ata cca agt ccc atg gga gaa tec ate att aac tor Mr 
Cys Glu Ser Ile Pro Ser Pro Met Gly Glu Ser Ile Ile Asp Cys Asn

420 425 

li^ } ta ^ 3 CC a E c ctg cca aat 9tt tea ttc acc ate gga aaa aaa aat 
Ser Leu Ser Thr Leu Pro Asn Val Ser Phe Thr Ile Gly Gly Lys Ser

435 440 

« Iff S3 *? S 5J fflj fcj S3 ffi 3; fj» g? a § aa g? c t« 

4b0 455 

9?* 2? a 8*5 tgc a J c a 9 t 99 a ttc at 9 Oct atg gat ata cca cca rrt 
Ala Glu Val Cys Ile Ser Gly Phe Met Ala Met Asp Val Pro Pro Pro

465 470 475

S?5 «J S3 !R 38 83 28 if? 22 SI? SS S3 1?? «f ~ <?< 

480 485 490

« 83 SS g 5f SS 55 || S3 H? Hi 2S »j JK K 

tag acaagactgt ttatttcgtc tactgtttga cggtcctaag agaagctatg
aagacatgta gtagcttgta aattaggatt taattatgct tggctggttt atgggtggtg
cttttaatat tatatgtaat gtaagcagat atgttacctt gttttagagt ttcaaggaaa
ctgcaatatt tacttccggt aaaaaaaaaa aaaaaaaaaa aaaaaaa

<210> 8 
<211> 507 
<212> PRT

<213> Coffea canephora
<400> 8

Met Glu Arg Arg Tyr Leu Trp Ala Ala Phe Val Leu Gly Ala Ile Val
Cys Ser Leu Phe Pro Leu Pro Ser Glu Gly Leu Lys Arg Ile Ser Leu
20 25 30

Lys Lys Lys Pro Leu Asp Ile Gln Ser Ile Arg Ala Ala Lys Leu Ala

His Leu Glu Ser Thr His Gly Ala Gly Arg Lys Glu Met Asp Asn Asn
50 55 60

Leu Gly Ser Ser Asn Glu Asp Ile Leu Pro Leu Lys Asn Tyr Leu Asp

80

Ala Gln Tyr Tyr Gly Glu Ile Gly Ile Gly Thr Pro Pro Gln Lys Phe

90 95

Thr Val Ile Phe Asp Thr Gly Ser Ser Asn Leu Trp Val Pro Ser Ala



105 



110 



Lys Cys Tyr Phe Ser Ile Ala Cys Trp Leu His Ser Lys Tyr Lys Ala

125

Lys Lys Ser Ser Thr Tyr Thr Ala Ile Gly Lys Ser Cys Ser Ile Arg

Tyr Gly Ser Gly Ser Ile Ser Gly Phe Ser Ser Gln Asp Asn Val Glu

150 155 160

Val Gly Asp Leu Val Val Lys Asp Gln Val Phe Ile Glu Ala Ser Arg

165 170

Glu Gly Ser Leu Thr Phe Val Ile Ala Lys Phe Asp Gly Ile Leu Gly
180 185 190

Leu Gly Phe Gln Glu Ile Ala Val Asp Asn Met Val Pro Val Trp Tyr

205

Asn Met Val Asp Gln Gly Leu Val Asp Glu Gln Val Phe Ser Phe Trp

220 



225 AS " ASP Pr ° A |8 Ala Glu AS P G1 X f]y Glu Leu Val Phe Gly 

Gly Val Asp Thr Asn His Phe Lys Gly Lys His Thr Tyr Val Pro Val

250 255

Thr Gln Lys Gly Tyr Trp Gln Phe Lys Met Gly Asp Phe Leu Ile Gly

265 270

Asn Val Ser Thr Gly Phe Cys Glu Gly Gly Cys Ala Ala Ile Val Asp



285 
Ser Gly Thr Ser Leu Leu Ala Gly Pro Thr Thr Val Val Thr Gln Ile



295 



300 



Asn His Ala Ile Gly Ala Glu Gly Val Val Ser Thr Glu Cys Lys Glu



310 



315 



320 



Ile Val Ser Gln Tyr Gly Glu Leu Ile Trp Asp Leu Leu Val Ser Gly
325 330 335

Val Leu Pro Asp Arg Val Cys Lys Gln Ala Gly Leu Cys Pro Leu Arg
340 345 350

Gly Ala Gln His Glu Asn Ala Tyr Ile Lys Ser Val Val Asp Glu Glu
355 360 365

Asn Lys Glu Glu Ala Ser Val Gly Glu Ser Pro Met Cys Thr Ala Cys
370 375 380

Glu Met Ala Val Val Trp Met Gln Asn Gln Leu Lys Gln Gln Gly Thr



395 



400 



Lys Glu Lys Val Leu Ala Tyr Val Asn Gln Leu Cys Glu Ser Ile Pro
405 410 415

Ser Pro Met Gly Glu Ser Ile Ile Asp Cys Asn Ser Leu Ser Thr Leu

425 430
Pro Asn Val Ser Phe Thr Ile Gly Gly Lys Ser Phe Glu Leu Thr Leu

440 445
Lys Glu Tyr Val Leu Arg Thr Gly Glu Gly Phe Ala Glu Val Cys Ile

455 460

Ser Gly Phe Met Ala Met Asp Val Pro Pro Pro Arg Gly Pro Ile Trp

470 475 480

Val Leu Gly Asp Val Phe Met Gly Val Tyr His Thr Val Phe Asp Tyr

Gly Asn Leu Arg Met Gly Phe Ala Arg Ala Ala
500 505 
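
The CDS feature ranges in the listing can be cross-checked against the stated lengths of the corresponding protein entries. The following is only a minimal sketch, not part of the listing itself: it assumes that each `<222>` CDS range is 1-based, inclusive, and includes the terminal stop codon, and that SEQ ID NO: 4, 6 and 8 are the translations of the CDS features of SEQ ID NO: 3, 5 and 7 respectively. The dictionary layout and the function names `codon_count` and `cds_matches_protein` are hypothetical, chosen for illustration.

```python
# CDS ranges (<222>) and protein lengths (<211> of the PRT entries),
# copied from the listing. The pairing of DNA and protein entries and the
# layout of this dictionary are assumptions made for illustration.
ENTRIES = {
    "SEQ ID NO: 3 / 4": {"cds": (79, 498), "protein_len": 139},
    "SEQ ID NO: 5 / 6": {"cds": (439, 1731), "protein_len": 430},
    "SEQ ID NO: 7 / 8": {"cds": (79, 1602), "protein_len": 507},
}


def codon_count(start: int, end: int) -> int:
    """Return the number of codons in an inclusive, 1-based nucleotide range."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"range {start}..{end} is not a whole number of codons")
    return length // 3


def cds_matches_protein(cds: tuple, protein_len: int) -> bool:
    """Check a CDS range against a protein length.

    Assumes the CDS range includes the stop codon, so the encoded protein
    has one residue fewer than the range has codons.
    """
    start, end = cds
    return codon_count(start, end) - 1 == protein_len


# One boolean per entry pair: True when coordinates and length agree.
results = {
    name: cds_matches_protein(entry["cds"], entry["protein_len"])
    for name, entry in ENTRIES.items()
}

if __name__ == "__main__":
    for name, ok in results.items():
        print(name, "consistent" if ok else "inconsistent")
```

Under these assumptions, a mismatch would point to a transcription error in either the feature coordinates or the protein length rather than to a property of the sequences themselves.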